Goto

Collaborating Authors

 test performance


OntheAccuracyofInfluenceFunctions forMeasuringGroupEffects

Neural Information Processing Systems

Influence functions estimate the effect of removing a training point on a model without theneedtoretrain. Theyarebasedonafirst-order Taylorapproximation thatisguaranteed tobeaccurate forsufficiently small changes tothemodel, and so are commonly used to study the effect of individual points in large datasets. However, we often want to study the effects of largegroups of training points, e.g., todiagnose batch effects orapportion credit between different data sources.










0be50b4590f1c5fdf4c8feddd63c4f67-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

In Figure 1 we demonstrate the common neighbor (CN) distribution among positive and negative test samples for ogbl-collab, ogbl-ppa, and ogbl-citation2. These results demonstrate that a vast majority of negative samples have no CNs. Since CNs is a typically good heuristic, this makes it easy to identify most negative samples. We further present the CN distribution of Cora, Citeseer, Pubmed, and ogbl-ddi in Figure 3. The CN distribution of Cora, Citeseer, and Pubmed are consistent with our previous observations on the OGB datasets in Figure 1. We note that ogbl-ddi exhibits a different distribution with other datasets. As compared to the other datasets, most of the negative samples in ogbl-ddi have common neighbors. This is likely because ogbl-ddi is considerably denser than the other graphs.